-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Breaking] Maintain a closed environment of GDAL-compatible types #179
Conversation
This will be part of a broader move away from GDAL's enums so that the interface for ArchGDAL is "closed" (i.e. takes and return ArchGDAL types) and not require users to import GDAL. * We need to be careful to maintain interoperability with GDAL's enums, so I've defined corresponding convert() methods, and methods to map between the different types. * I also took the opportunity to update the _FIELDTYPE dictionary now that I see how it's used (in `src/tables.jl`), and will commit the code for supporting the corresponding types as part of the same PR.
…idth=80 1. When writing the return statements, return should be at the top-level, rather than being nested inside e.g. an if-statement. * The convention of explicit returns is something that's been on my mind for a long time, and I think it's net healthier (when developing code) to be explicit about what's being returned to reduce on the number of gotchas when using it (e.g. in #171). 2. Return `nothing` upon destroying a GDAL object (rather than allowing users to encounter and depend on potentially dangerous behavior per https://www.hyrumslaw.com/) 3. The max columnwidth of 80char is somewhat arbitrary but most of the codebase is already in alignment, so I just want to bring it to 100% to reduce on the number of stylistic rules that we're still ambiguous about.
…-types or GDAL-types)
…f test error reports
(i) move constants into its own file (ii) use a macro for generating conversion methods between enums
It now returns the dataset that was copied into as the return type.
It returns a Ptr{Void}, and feels sufficiently unsafe for people to work with. (I initially thought it might return a String.) We don't have any tests for it, and I haven't seen widespread use for it through GDAL, so it's safer to deprecate this function until I've thought through how to make it accessible to users.
We also use it in the rasterio functions for safe behavior, rather than assuming that the bands all have the same pixeltype.
We should be using the interface based on https://tables.juliadata.org/stable/#Tables.AbstractRow-1, rather than overriding base methods. Along the way, I realize that the methods for getfield(feature, name) and getgeom(feature, name) do not handle the case when the name is invalid, so I've added implementations where they are missing, and return nothing/C_NULL for existing implementations. I'll use coverage results later to see where to augment with tests.
They were useful in a period of time when all GDAL objects are just Ptr{Void} and hence there was no way to distinguish between the type of GDAL object if we did not wrap the pointer in one of those constants. Ever since then, we have introduced corresponding mutable structs in types.jl that have obviated the need for using pointers directly.
This uses the enum macro in Base to define each of the GDAL enums. * I decided to maintain names with GDAL for the most part to maintain searchability for the same names. (This is a breaking change for some of the existing names, e.g. for GDALOpenFlag.) * I decided not to go with cenum because I didn't want to support having different enum values evaluate to the same UInt8. * We no longer expose internal dicts of mappings between data types. Now everything should go through base conversion. * I verified (by a search on `::GDAL.` that the only remaining GDAL.jl types in the API for this package are GDAL.GDALColorEntry, GDAL.OGREnvelope, and GDAL.OGREnvelope3D. As they are well-defined structs, I see no compelling reason to wrap them yet. * [Breaking] We now use Base conversion for mapping between the different data types, rather than having gdaltype(), datatype(), etc. * Cleaned up the implementation of rasterio!() to use GDAL.jl's functions rather than doings its own ccall.
src/raster/images.jl
Outdated
else | ||
error( | ||
""" | ||
Unsupported GPI: $gpi. Please file an issue at | ||
https://github.com/yeesian/ArchGDAL.jl/issues if it should be supported. | ||
""", | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run into this error with GPI_RGB
. In the other methods below GPI_RGB
does seem to be supported however. Is there an elseif missing here or is it more complicated than that?
using ArchGDAL
url = "https://neo.sci.gsfc.nasa.gov/archive/geotiff/CERES_INSOL_M/CERES_INSOL_M_2020-07.TIFF"
ArchGDAL.imread("/vsicurl/" * url)
ERROR: LoadError: Unsupported GPI: GPI_RGB. Please file an issue at
https://github.com/yeesian/ArchGDAL.jl/issues if it should be supported.
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:33
[2] imview(gpi::ArchGDAL.GDALPaletteInterp, imgvalues::Matrix{UInt8})
@ ArchGDAL C:\Users\visser_mn\.julia\dev\ArchGDAL\src\raster\images.jl:39
[3] imread(dataset::ArchGDAL.Dataset, indices::UnitRange{Int64})
@ ArchGDAL C:\Users\visser_mn\.julia\dev\ArchGDAL\src\raster\images.jl:250
[4] imread
@ C:\Users\visser_mn\.julia\dev\ArchGDAL\src\raster\images.jl:253 [inlined]
[5] #184
@ C:\Users\visser_mn\.julia\dev\ArchGDAL\src\raster\images.jl:257 [inlined]
[6] read(f::ArchGDAL.var"#184#185", args::String; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ ArchGDAL C:\Users\visser_mn\.julia\dev\ArchGDAL\src\context.jl:267
[7] read
@ C:\Users\visser_mn\.julia\dev\ArchGDAL\src\context.jl:265 [inlined]
[8] imread(filename::String)
@ ArchGDAL C:\Users\visser_mn\.julia\dev\ArchGDAL\src\raster\images.jl:256
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I haven't learnt how to handle "single channel" matrices -- since RGB channels are typically three matrices, rather than a single matrix?
julia> using ArchGDAL
julia> url = "https://neo.sci.gsfc.nasa.gov/archive/geotiff/CERES_INSOL_M/CERES_INSOL_M_2020-07.TIFF"
"https://neo.sci.gsfc.nasa.gov/archive/geotiff/CERES_INSOL_M/CERES_INSOL_M_2020-07.TIFF"
julia> ds = ArchGDAL.read("/vsicurl/" * url)
GDAL Dataset (Driver: GTiff/GeoTIFF)
File(s):
/vsicurl/https://neo.sci.gsfc.nasa.gov/archive/geotiff/CERES_INSOL_M/CERES_INSOL_M_2020-07.TIFF
Dataset (width x height): 1440 x 720 (pixels)
Number of raster bands: 1
[GA_ReadOnly] Band 1 (Palette): 1440 x 720 (UInt8)
Looking into how it should have been handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an elseif missing here or is it more complicated than that?
Sorry it took me this long to respond here: it's addressed in 317272f (which requires us to expose GDAL.GDALColorEntry
and continue supporting some of the colortable methods). The implementation is less than ideal, but I want to just capture the API and get things working first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the trouble: re-requesting your review for 317272f, and to confirm that the example you gave in #179 (comment) is working for you now.
@yeesian thanks I'll have to update and try it out. Also FYI this PR will be a reliable way to replace all open datasets nested anywhere in a return value object. JuliaObjects/Accessors.jl#23 It may produce errors with types that are not compatible with ConstructionBase.jl, but only in the case that they would contain a pointer to a closed file anyway, which is a worse error. This error would also be immediate rather than delayed, so you could use a I can make a PR here when it's done if that helps. To make it clearer what I mean, syntax will be something like: @modify convert_open_dataset(x) for x in returnval if x isa Union{Dataset,RasterDataset} |
@rafaqz gentle ping to follow-up on #179 (comment) |
Ok this works fine for GeoData.jl, with a few tweaks for the new default nodata value and uppercase const variables. I'm not sure about the choice to return help?> nothing
search: nothing Nothing isnothing
nothing
The singleton instance of type Nothing, used by convention when there is no value to return (as in a C void
function) or when a variable or field holds no value.
help?> missing
search: missing Missing missingval missingmask MissingException ismissing nonmissingtype skipmissing
missing
The singleton instance of type Missing representing a missing value. |
@rafaqz regarding #179 (comment), the nice thing about julia> [1 2;
missing 4] + ones(2,2)
2×2 Matrix{Union{Missing, Float64}}:
2.0 3.0
missing 5.0 In those cases where there is no nodata value, it looks like a more reasonable behavior compared to julia> [1 2;
nothing 4] + ones(2,2)
ERROR: MethodError: no method matching +(::Nothing, ::Float64)
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
+(::Base.TwicePrecision, ::Number) at twiceprecision.jl:267
+(::ColorTypes.AbstractGray{T} where T, ::Number) at /Users/yeesian/.julia/packages/ColorVectorSpace/71ig5/src/ColorVectorSpace.jl:285
... The fact that NCDatasets returns
I've opened #192 to make sure I systematically scan the package for NULL semantics later, but just want to make sure we're okay with going down that path first. |
I would argue the exact opposite... NCDatasets returns an array that actually contains I don't understand this example either: [1 2;
nothing 4] + ones(2,2) There are no nothing values actually in the raster when there is no nodata value. It's all numbers. |
To be clear I'm talking only about the result of : ArchGDAL.getnodatavalue(band) When there is actually not nodata in the array. Do you mean returning |
A problem with returning I had problems with files from R raster where something like this had happened and multiple missing values were present in the file, which of course broke a lot of things. I think it's better to throw an error than let this happen unintentionally. |
This is for two reasons: 1. The nodata value isn't meant to be used as a "sentinel value" in the raster output. If ArchGDAL wants to provide a convenience function for returning `Array{Union{T, Missing}}`, the treatment of missing data should be based on the appropriate use of `getmaskband()` (https://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask for details). 2. Even if ArchGDAL does provide a convenience function for returning `Array{Union{T, Missing}}`, the proper place for using `missing` values is in the array being constructed, and not the value being returned from `getnodatavalue(band)`.
…GDAL.jl into type-error-messages
Thanks for correcting my misunderstanding with #179 (comment) and #179 (comment). I see where you're coming from and agree with you now. I'll describe what I now understand (let me know if I'm still misunderstanding):
For those reasons, I've changed back from |
Not exactly... I think Also, I actually have a lot of problems with NCDatasets.jl behaviour of returning a I basically agree with everything written here in the current implementation: ArchGDAL.jl/src/raster/rasterband.jl Lines 162 to 188 in 1dc8ca8
I dont know really :) I guess even in this case you would want to know that there is actually no nodata values in the array, as opposed to ArchGDAL replacing the current nodata value with |
Fixes #196, #192, #178, #167, #155, #154, #83, #42
As it wasn't obvious what's being returned from functions, it's difficult to track what's still "leaking" GDAL data types. Therefore, there is a strong emphasis in this PR on (i) annotating all functions with return types, and (ii) getting test coverage close to 100%. In the process, I caught quite a few non-obvious bugs, and am instituting guides and processes to make it easier to catch them in the future.
For reviewers
I am sorry about making pull requests that are this big in scope, and mixing between multiple different types of changes. I encourage reading this comment in full, before proceeding to review the files changed. If it helps to know:
Changelog
Breaking
The biggest ones are:
_FIELDTYPE[fieldtype]
->convert(DataType, fieldtype)
_GDALTYPE[dtype]
->convert(GDALDataType, dtype)
High level summary:
ArchGDAL.Table
type no longer exists, since the Tables interface is now implemented directly on anyAbstractFeatureLayer
, meaning a layer will directly act as a table.pixeltype(dataset)
to be based on the widestpixeltype()
of all its bands. (Previously, it assumed that all bands have the same pixeltype, and just used the pixeltype of the first band in all therasterio!()
methods.)!
to annotate mutating functions.instead of
Detailed list of changes:
_FIELDTYPE[fieldtype]
->convert(DataType, fieldtype)
_GDALTYPE[dtype]
->convert(GDALDataType, dtype)
setfield!(feature::Feature, i::Integer, value::Vector{GDAL.GIntBig})
->setfield!(feature::Feature, i::Integer, value::Vector{Int64})
setfield!(feature::Feature, i::Integer, value::Vector{GDAL.GByte})
->setfield!(feature::Feature, i::Integer, value::Vector{UInt8})
validate(feature::Feature, flags::Integer, emiterror::Bool)
->validate(feature::Feature, flags::FieldValidation, emiterror::Bool)
copywholeraster(...)
->copywholeraster!(...)
_dataset_type(dataset)
->pixeltype(dataset)
OF_ReadOnly
->OF_READONLY
OF_Update
->OF_UPDATE
OF_Raster
->OF_RASTER
OF_Vector
->OF_VECTOR
OF_Verbose_Error
->OF_VERBOSE_ERROR
initialize!(stylemanager::StyleManager, feature::Feature)
ncolorentry(ct::ColorTable)
getcolorentry(ct::ColorTable, i::Integer)
getcolorentryasrgb(ct::ColorTable, i::Integer)
setcolorentry!(ct::ColorTable, i::Integer, entry::GDAL.GDALColorEntry)
createcolorramp!(...)
serializeJSON(rat::RasterAttrTable)
(It hasn't been tested: you can fallback onGDAL.gdalratserializejson(rat.ptr)
and open it as a feature request.)setcategorynames!(band::AbstractRasterBand, names)
(fallback:GDAL.gdalsetrastercategorynames(band.ptr, names)
)getlayer(t::Table)
(useTables.rows(t)
instead)Base.getindex(t::Table, idx::Integer)
(Only in rare circumstances is SetNextByIndex() efficiently implemented, and we're not willing to provide the expectation that it is. Rather than treating the table like an array, users should materialize the data they need into one and go from there.)Base.IteratorSize(::Type{<:Table})
Base.size(t::Table)
andBase.length(t::Table)
Base.IteratorEltype(::Type{<:Table})
Base.propertynames(t::Table)
Base.getproperty(t::Table, s::Symbol)
nextnamedtuple(layer::IFeatureLayer)
schema_names(layer::AbstractFeatureLayer)
(useschema_names(featuredefn)
instead)_JLTYPE
_GDALTYPE
(used in https://github.com/evetion/GeoArrays.jl/blob/6d7fb9bd05c3a2a2ca58129cbd5427a4bf491b92/src/io.jl#L65-L77)_FIELDTYPE
(used in https://github.com/sigmafelix/jugs/blob/master/jugs_base.jl#L73, https://github.com/evetion/GeoDataFrames.jl/blob/4ca6e1159f401e9b70b15acb3655a1a84a50d4de/src/io.jl#L7, https://github.com/cossatot/Oiler/blob/795ccbbc89b67f2b919071e2c04ff54df2d46633/src/io.jl#L153)Most users shouldn't be affected by the following changes:
GDALColorTable
(fallback is to useGDAL.GDALColorTableH
)GDALCoordTransform
(fallback is to useGDAL.OGRCoordinateTransformationH
)GDALDataset
(fallback is to useGDAL.GDALDatasetH
)GDALDriver
(fallback is to useGDAL.GDALDriverH
)GDALFeature
(fallback is to useGDAL.OGRFeatureH
)GDALFeatureDefn
(fallback is to useGDAL.OGRFeatureDefnH
)GDALFeatureLayer
(fallback is to useGDAL.OGRLayerH
)GDALField
(fallback is to useGDAL.OGRField
)GDALFieldDefn
(fallback is to useGDAL.OGRFieldDefnH
)GDALGeometry
(fallback is to useGDAL.OGRGeometryH
)GDALGeomFieldDefn
(fallback is to useGDAL.OGRGeomFieldDefnH
)GDALProgressFunc
(fallback is to useGDAL.GDALProgressFunc
)GDALRasterAttrTable
(fallback is to useGDAL.GDALRasterAttributeTableH
)GDALRasterBand
(fallback is to useGDAL.GDALRasterBandH
)GDALSpatialRef
(fallback is to useGDAL.OGRSpatialReferenceH
)GDALStyleManager
(fallback is to useGDAL.OGRStyleMgrH
)GDALStyleTable
(fallback is to useGDAL.OGRStyleTableH
)GDALStyleTool
(fallback is to useGDAL.OGRStyleToolH
)StringList
(fallback is to usePtr{Cstring}
)destroy(drv::Driver)
destroy(...)
returnsnothing
(this should have been the default behavior all along; see e.g. https://github.com/niclasmattsson/GlobalEnergyGIS/blob/2e6f87df8ad6a44de25f2a207dbabe7c0f41eca7/src/rasterize_shapefiles.jl#L99-L100)RasterDataset{T,DS}
->RasterDataset{T,DS<:AbstractDataset}
GDALRasterBand
->GDAL.GDALRasterBandH
Base.show(io::IO, object)
now returnsnothing
(although it shouldn't have been depended on, nor should it be depended on)BlockIterator -> BlockIterator{T<:Integer}
(T
being the data type of the iterator index)WindowIterator -> WindowIterator{T<:Integer}
(T
being the data type of the iterator index)BufferIterator{<: Real} -> BufferIterator{<: Real, <:Integer}
Stylistic
return
keyword should be at the top level if it is the last statement in the function.destroy(...)
will returnnothing
(breaking changes described above).gdalenum
macro to ease with the wrapping of GDAL enum types (of which there are plenty).Organizational
Remaining TODOs